Abstract. 

The Dark Energy Survey Data Management (DESDM) system will process 
and archive the data from the Dark Energy Survey (DES) over its five-year 
period of operation. This paper focuses on a new adaptable processing 
framework developed to perform highly automated, high-performance, 
data-parallel processing. The framework has been used to process 45 nights 
of simulated DECam supernova imaging data and was used extensively in the 
DES Data Challenge 4 to process thousands of square degrees of simulated 
DES data. 



1. Introduction 

The Dark Energy Survey (DES, 2011-2016) is an optical survey of 5000 deg^2 of 
the South Galactic Cap to ~24th magnitude in multiple filter bands (grizY) using 
a new wide field CCD camera, DECam, mounted on the Blanco 4-m telescope 
at Cerro Tololo Inter-American Observatory (CTIO). DECam is a large 
focal plane array with a short readout time which will collect approximately 
300 GB of science images per night of observation. Additional data products 
will be generated through a series of image processing steps resulting in a total 
of approximately 3 TB (uncompressed) of data for each night. 

Figure 1. DESDM Overview. 

The Dark Energy Survey Data Management system (DESDM) (see Figure 1) 
will process and archive the data from the Dark Energy Survey (DES) 
over its five-year period of operation. This paper focuses on a new adaptable 
processing framework developed to perform highly automated, high-performance, 
data-parallel processing. The framework has been used to process 45 nights 
of simulated DECam supernova imaging data and was used extensively in the 
DES Data Challenge 4 to process thousands of square degrees of simulated 
DES data. 



2. Processing Framework 

The processing framework consists of two main components: (1) modular pipelines 
that execute science codes and (2) an orchestration layer to manage the pipelines. 
The DESDM pipelines make use of application containers to wrap astronomi- 
cal pipeline modules to be executed on the compute nodes of HPC resources. 
These application containers send events to the Notification service for viewing 
by the operator. The orchestration layer of the DESDM processing framework is 
responsible for preparing job descriptions (using operator input parameters cou- 
pled with the results of database queries), deploying required files and data to 
high-performance computing platforms, and executing and monitoring the sets 
of data-parallel jobs that comprise the astronomical processing. It interfaces 
with the DESDM Data Access Framework for efficient and reliable transfer of 
files and image data to compute platforms. 

2.1. Application Modules 

DESDM processing pipelines are constructed with a Java middleware layer 
provided by the Elf/OgreScript software developed at the NCSA. Elf is a robust 
application container, and OgreScript is a workflow scripting language with 
scripts encoded in XML. The container middleware wraps the astronomical mod- 
ules to be executed on the compute nodes of a target machine, for example, a 
TeraGrid cluster. Elf/OgreScript allows important status and quality assurance 
information to be issued within events from running applications. The container 
middleware supports the parsing of the streaming stdout and stderr from ap- 
plication science codes, and by this mechanism astronomy codes can send vital 
information and updates through the events system. The events are sent to 
remote Notification services, which gather the events from distributed processes 
into a central repository. 
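
As an illustration of this mechanism, the following minimal Python sketch 
shows how a container might turn lines of a science code's output into events. 
It is not the Elf/OgreScript implementation, which is Java middleware driven 
by XML scripts. The `STATUS:` and `QA:` line prefixes, the `Event` type, and 
the `parse_events` function are hypothetical names chosen for this example. 

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical convention: science codes print lines such as
#   "STATUS: detrending started"  or  "QA: seeing=0.92"
# Neither the prefixes nor the Event type come from Elf/OgreScript.
EVENT_PATTERN = re.compile(r"^(STATUS|QA):\s*(.*)$")


@dataclass(frozen=True)
class Event:
    stream: str   # "stdout" or "stderr"
    kind: str     # "STATUS" or "QA"
    message: str


def parse_events(lines: Iterable[str], stream: str = "stdout") -> Iterator[Event]:
    """Yield an Event for every line that matches the assumed prefix convention."""
    for line in lines:
        match = EVENT_PATTERN.match(line.strip())
        if match:
            yield Event(stream, match.group(1), match.group(2))
```

In a deployment along the lines described above, the yielded events would be 
forwarded to a Notification service rather than collected locally. 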

A single Elf/OgreScript execution may be composed of a sequence of ap- 
plication modules. New codes are easily added into processing pipelines by 
writing simple module descriptions with information such as the executable to 
be launched, command line arguments, and descriptions of input lists or files. 
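
To make the idea of a module description concrete, here is a small Python 
sketch that holds the three kinds of information listed above and assembles a 
command line from them. The actual descriptions are written for 
Elf/OgreScript in XML; the `ModuleDescription` class, the `build_command` 
function, the `-list` flag, and the module name in the usage note are all 
assumptions made for illustration. 

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModuleDescription:
    """Hypothetical stand-in for an XML module description."""
    executable: str
    arguments: List[str] = field(default_factory=list)
    input_list: str = ""  # path to a file listing the input images


def build_command(module: ModuleDescription) -> List[str]:
    """Assemble the argument vector a container could launch for this module."""
    command = [module.executable, *module.arguments]
    if module.input_list:
        # The "-list" flag is an assumption made for this example only.
        command += ["-list", module.input_list]
    return command
```

For example, `build_command(ModuleDescription("detrend_module", ["-verbose", "2"], "night1.list"))` 
yields the argument vector for one pipeline step; adding a new code then 
amounts to writing one more description. 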

2.2. Orchestration 

The orchestration layer is written in Perl and utilizes the Perl DBI module to 
abstract its interactions with the database persistence layer. This will enable 
transparent access to different database types (Oracle, MySQL, PostgreSQL, 
etc.). Perl scripts of the orchestration layer serve as wrappers for various 
Condor (Thain, Tannenbaum, & Livny 2005) commands, converting between more 
customized scientific inputs/outputs and the general Condor inputs/outputs. 
For each data-parallel block of codes, the orchestration layer must perform 
several tasks. First, it queries the central database to get a list of input 
images and divides that list into sublists for the data-parallel jobs. It then 
stages the input images to the target machine. Next, it creates the 
Elf/OgreScript XML scripts and properties files for the jobs and stages these 
files and the input lists to the target machine. Finally, it submits the 
pipeline jobs, using vanilla Condor jobs if submitting to the local Condor pool 
or Condor-G jobs if submitting to remote target machines such as the TeraGrid 
cluster. To automatically control the sequence of these steps through the 
entire image processing, the orchestration uses Condor's Directed Acyclic 
Graph Manager (DAGMan). 
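
Two of these steps, dividing the image list into sublists and ordering the 
work with DAGMan, can be sketched as follows. This is a minimal Python 
illustration, not the Perl orchestration code itself. The function names, the 
job names, and the `.sub` submit-file names are hypothetical; only the `JOB` 
and `PARENT ... CHILD` keywords are standard DAGMan input-file syntax. 

```python
from typing import List, Sequence


def split_into_sublists(images: Sequence[str], n_jobs: int) -> List[List[str]]:
    """Divide the image list into at most n_jobs contiguous, near-equal sublists."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    n_jobs = min(n_jobs, len(images)) or 1  # never more jobs than images
    base, extra = divmod(len(images), n_jobs)
    sublists, start = [], 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)  # spread the remainder
        sublists.append(list(images[start:start + size]))
        start += size
    return sublists


def dag_text(n_jobs: int) -> str:
    """Build DAGMan input text: stage-in, then parallel jobs, then stage-out."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    job_names = [f"process_{i}" for i in range(n_jobs)]
    lines = ["JOB stage_in stage_in.sub"]  # submit-file names are hypothetical
    lines += [f"JOB {name} {name}.sub" for name in job_names]
    lines.append("JOB stage_out stage_out.sub")
    # All processing jobs wait for staging; stage-out waits for all of them.
    lines.append("PARENT stage_in CHILD " + " ".join(job_names))
    lines.append("PARENT " + " ".join(job_names) + " CHILD stage_out")
    return "\n".join(lines) + "\n"
```

Expressing the dependencies as a DAG lets the processing jobs run in parallel 
while still guaranteeing that staging completes before any of them starts. 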

Orchestration uses the Data Access Framework (DAF) to stage files in several 
cases. The DAF is a set of programs that operators can also use to transfer 
files while updating the central database. Locally created input lists and 
files need to be copied to the target machine. Sometimes the images need to be 
copied from other archive locations. After processing, the newly created 
images and files can be backed up to other archive locations. The processing 
framework also uses the centralized database to create input lists of images 
and metadata for the target jobs. 

3. Results 

The processing framework has been used to run several pipelines on simulated 
DES data as well as real data from the Blanco Cosmology Survey (BCS). An 
earlier version was used to reduce 45 nights of simulated DECam supernova 
imaging data. In Data Challenge 4 (DC4), which finished at the end of January 
2009, the processing framework was used to run three different astronomy 
pipelines on almost the entire DC4 data set: the nightly processing pipeline, 
the coaddition pipeline, and the weak lensing pipeline. The nightly processing 
pipeline involves crosstalk correction, detrending, and astrometric 
refinement, followed by remapping and catalog ingestion. It was run on 10 
nights of simulated DES data, of which 3 were nonphotometric. The coaddition 
and weak lensing pipelines were processed on a tile-by-tile basis, where the 
sky was divided into nonoverlapping rectangular regions called tiles; both 
were run on about 250 tiles of simulated DC4 data. The coaddition pipeline 
involves combining multiple images of the same region of sky and different 
bands into multiple images. The weak lensing pipeline involves identification 
of bright stars that are useful for PSF description, measurement of the 
shapelet decomposition of stars, and estimation of the shear of deconvolved 
galaxies. In addition, we developed the PSF homogenization and difference 
imaging pipeline, which was tested on a small subset of DC4 data. More details 
of these pipelines can be found in Mohr et al. (2008). 